fused lasso
T-LoHo: ABayesian Regularization Model for Structured Sparsity and Smoothness on Graphs
Graphs have been commonly used to represent complex data structures. In models dealing with graph-structured data, multivariate parameters may not only exhibit sparse patterns but have structured sparsity and smoothness in the sense that both zero and non-zero parameters tend to cluster together. We propose a new prior for high-dimensional parameters with graphical relations, referred to as the Treebased Low-rank Horseshoe (T-LoHo) model, that generalizes the popular univariate Bayesian horseshoe shrinkage prior to the multivariate setting to detect structured sparsity and smoothness simultaneously. The T-LoHo prior can be embedded in many high-dimensional hierarchical models. To illustrate its utility, we apply it to regularize a Bayesian high-dimensional regression problem where the regression coefficients are linked by a graph, so that the resulting clusters have flexible shapes and satisfy the cluster contiguity constraint with respect to the graph. We design an efficient Markov chain Monte Carlo algorithm that delivers full Bayesian inference with uncertainty measures for model parameters such as the number of clusters. We offer theoretical investigations of the clustering effects and posterior concentration results. Finally, we illustrate the performance of the model with simulation studies and a real data application for anomaly detection on a road network. The results indicate substantial improvements over other competing methods such as the sparse fused lasso.
A Sharp Error Analysis for the Fused Lasso, with Application to Approximate Changepoint Screening
In the 1-dimensional multiple changepoint detection problem, we derive a new fast error rate for the fused lasso estimator, under the assumption that the mean vector has a sparse number of changepoints. This rate is seen to be suboptimal (compared to the minimax rate) by only a factor of $\log\log{n}$. Our proof technique is centered around a novel construction that we call a lower interpolant. We extend our results to misspecified models and exponential family distributions. We also describe the implications of our error analysis for the approximate screening of changepoints.
Spacing Test for Fused Lasso
Tasaka, Rieko, Kimura, Tatsuya, Suzuki, Joe
Detecting changepoints in a one-dimensional signal is a classical yet fundamental problem. The fused lasso provides an elegant convex formulation that produces a stepwise estimate of the mean, but quantifying the uncertainty of the detected changepoints remains difficult. Post-selection inference (PSI) offers a principled way to compute valid $p$-values after a data-driven selection, but its application to the fused lasso has been considered computationally cumbersome, requiring the tracking of many ``hit'' and ``leave'' events along the regularization path. In this paper, we show that the one-dimensional fused lasso has a surprisingly simple geometry: each changepoint enters in a strictly one-sided fashion, and there are no leave events. This structure implies that the so-called \emph{conservative spacing test} of Tibshirani et al.\ (2016), previously regarded as an approximation, is in fact \emph{exact}. The truncation region in the selective law reduces to a single lower bound given by the next knot on the LARS path. As a result, the exact selective $p$-value takes a closed form identical to the simple spacing statistic used in the LARS/lasso setting, with no additional computation. This finding establishes one of the rare cases in which an exact PSI procedure for the generalized lasso admits a closed-form pivot. We further validate the result by simulations and real data, confirming both exact calibration and high power. Keywords: fused lasso; changepoint detection; post-selection inference; spacing test; monotone LASSO
pared: Model selection using multi-objective optimization
Das, Priyam, Robinson, Sarah, Peterson, Christine B.
Motivation: Model selection is a ubiquitous challenge in statistics. For penalized models, model selection typically entails tuning hyperparameters to maximize a measure of fit or minimize out-of-sample prediction error. However, these criteria fail to reflect other desirable characteristics, such as model sparsity, interpretability, or smoothness. Results: We present the R package pared to enable the use of multi-objective optimization for model selection. Our approach entails the use of Gaussian process-based optimization to efficiently identify solutions that represent desirable trade-offs. Our implementation includes popular models with multiple objectives including the elastic net, fused lasso, fused graphical lasso, and group graphical lasso. Our R package generates interactive graphics that allow the user to identify hyperparameter values that result in fitted models which lie on the Pareto frontier.
Minmax Trend Filtering: A Locally Adaptive Nonparametric Regression Method via Pointwise Min Max Optimization
Trend Filtering is a nonparametric regression method which exhibits local adaptivity, in contrast to a host of classical linear smoothing methods. However, there seems to be no unanimously agreed upon definition of local adaptivity in the literature. A question we seek to answer here is how exactly is Fused Lasso or Total Variation Denoising, which is Trend Filtering of order $0$, locally adaptive? To answer this question, we first derive a new pointwise formula for the Fused Lasso estimator in terms of min-max/max-min optimization of penalized local averages. This pointwise representation appears to be new and gives a concrete explanation of the local adaptivity of Fused Lasso. It yields that the estimation error of Fused Lasso at any given point is bounded by the best (local) bias variance tradeoff where bias and variance have a slightly different meaning than usual. We then propose higher order polynomial versions of Fused Lasso which are defined pointwise in terms of min-max/max-min optimization of penalized local polynomial regressions. These appear to be new nonparametric regression methods, different from any existing method in the nonparametric regression toolbox. We call these estimators Minmax Trend Filtering. They continue to enjoy the notion of local adaptivity in the sense that their estimation error at any given point is bounded by the best (local) bias variance tradeoff.
Untangling Lariats: Subgradient Following of Variationally Penalized Objectives
Mo, Kai-Chia, Shalev-Shwartz, Shai, Shártov, Nisæl
We describe a novel subgradient following apparatus for calculating the optimum of convex problems with variational penalties. In this setting, we receive a sequence $y_i,\ldots,y_n$ and seek a smooth sequence $x_1,\ldots,x_n$. The smooth sequence attains the minimum Bregman divergence to an input sequence with additive variational penalties in the general form of $\sum_i g_i(x_{i+1}-x_i)$. We derive, as special cases of our apparatus, known algorithms for the fused lasso and isotonic regression. Our approach also facilitates new variational penalties such as non-smooth barrier functions. We next derive and analyze multivariate problems in which $\mathbf{x}_i,\mathbf{y}_i\in\mathbb{R}^d$ and variational penalties that depend on $\|\mathbf{x}_{i+1}-\mathbf{x}_i\|$. The norms we consider are $\ell_2$ and $\ell_\infty$ which promote group sparsity. Last but not least, we derive a lattice-based subgradient following for variational penalties characterized through the output of arbitrary convolutional filters. This paradigm yields efficient solvers for problems in which sparse high-order discrete derivatives such as acceleration and jerk are desirable.